Graphene wetting benchmark #333

Open
dwl38 wants to merge 3 commits into ddmms:main from dwl38:graphene_wetting_benchmark
@dwl38 dwl38 commented Jan 31, 2026

Pre-review checklist for PR author

PR author must check the checkboxes below when creating the PR.

Summary

Added a new benchmark based on the adsorption energy curves of a single water molecule (at various orientations) on a graphene sheet (under various strain conditions), which is useful for understanding nanoscale wetting. Reference energies were calculated with the PBE functional in FHI-aims using the "intermediate" settings.

Three metrics of equal weight:

  1. MAE of all single-point calculations (across all orientations, strains, and distances)
  2. MAE of binding energies (across all orientations and strains), by comparing fitted adsorption energy curves
  3. MAE of binding lengths (across all orientations and strains), from the same method
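
Metrics 2 and 3 both depend on locating the minimum of each adsorption energy curve. The summary does not state which functional form is fitted, so the following is only a minimal sketch that assumes a three-point parabolic fit around the lowest sampled point. The function name `parabolic_minimum` and the sample data are hypothetical.

```python
def parabolic_minimum(distances, energies):
    """Estimate (binding_length, binding_energy) from a sampled curve.

    Assumption: a parabola through the lowest sampled point and its two
    neighbours stands in for whatever fit the benchmark actually uses.
    Returns None when the minimum is not bracketed by the scan.
    """
    i = min(range(len(energies)), key=energies.__getitem__)
    if i == 0 or i == len(energies) - 1:
        # Lowest point sits on the edge of the scan, so no minimum is bracketed.
        return None
    x0, x1, x2 = distances[i - 1 : i + 2]
    y0, y1, y2 = energies[i - 1 : i + 2]
    # Coefficients of y = a*x^2 + b*x + c through the three points.
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    c = y1 - a * x1**2 - b * x1
    if a <= 0:
        # Flat or inverted curve: there is no well to report.
        return None
    x_min = -b / (2 * a)
    return x_min, a * x_min**2 + b * x_min + c


# Synthetic curve (not benchmark data): a parabola centred on 3.2 Å at -120 meV.
distances = [2.5, 3.0, 3.5, 4.0]
energies = [100 * (x - 3.2) ** 2 - 120 for x in distances]
binding_length, binding_energy = parabolic_minimum(distances, energies)
```

Returning `None` for an unbracketed minimum gives the analysis a clear signal that a model did not produce a usable well, rather than a fitted value that only looks plausible.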

Linked issue

Resolves #292

Progress

  • Calculations
  • Analysis
  • Application
  • Documentation

Testing

Benchmark tested on all currently-implemented models, i.e.:

  • mace-mp-0a
  • mace-mp-0b3
  • mace-mpa-0
  • mace-omat-0
  • mace-matpes-r2scan
  • orb-v3-consv-inf-omat
  • pet-mad

with no issues. At the time of writing, mace-mp-0a performs best on all metrics, whereas pet-mad completely fails to produce physically reasonable adsorption energy curves.

New decorators/callbacks

Added a new "plot_from_scatter" callback, which implements almost identical functionality to "struct_from_scatter" except that it renders a Plotly Graph object instead.

Author

dwl38 commented Jan 31, 2026

Database file:
graphene_wetting_under_strain.zip

Collaborator

joehart2001 commented Feb 3, 2026

Looking super good overall! The key thing I've noticed is the processed_data function in analysis. Because the decorators sit inside the function, the script will fail if any one model has no data, so I think we need to think about how to split it up to make it more robust.

update: this is an us problem not a you problem

@ElliottKasoar ElliottKasoar added the new benchmark Proposals and suggestions for new benchmarks label Feb 6, 2026
@ElliottKasoar ElliottKasoar self-requested a review February 18, 2026 15:49
Collaborator

@ElliottKasoar ElliottKasoar left a comment

Thanks for this, it's looking really nice! I've left a few mostly minor comments.

My main suggestions/questions revolve around the app callbacks. I think what you have works really well, but if possible it would be great to integrate any modifications you need into the existing helper functions we have.

Also, in terms of visualisation, I wonder how tricky it would be to view the structure as a trajectory vs distance, a bit like our live NEBs example - so if you click on a point, it shows the structure at that distance, but you can also play the trajectory as the distance changes.

Comment on lines +60 to +63
global DATABASE_INFO_SAVED
if not DATABASE_INFO_SAVED:
    OUT_PATH.mkdir(parents=True, exist_ok=True)
    database_info_path = OUT_PATH / "database_info.yml"
Collaborator

Can you not just check if (OUT_PATH / "database_info.yml").exists()?
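
As a rough sketch of that suggestion, the global flag could be replaced by a check on the file itself. The helper name `save_database_info_once` and the placeholder file contents are hypothetical; only `OUT_PATH / "database_info.yml"` comes from the diff.

```python
import tempfile
from pathlib import Path


def save_database_info_once(out_path: Path, text: str) -> bool:
    """Write database_info.yml only if it is not already on disk.

    Returns True when this call wrote the file, False when it already existed.
    """
    out_path.mkdir(parents=True, exist_ok=True)
    database_info_path = out_path / "database_info.yml"
    if database_info_path.exists():
        return False
    database_info_path.write_text(text)
    return True


# Demonstration in a throwaway directory; the YAML text is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "outputs"
    first = save_database_info_once(out, "functional: PBE\n")
    second = save_database_info_once(out, "functional: PBE\n")
```

One trade-off to keep in mind: unlike an in-memory flag, the file check persists across runs, so a stale `database_info.yml` from an earlier run would not be refreshed unless it is deleted first.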

water_energy = atoms.get_potential_energy()

# Iterate through strain conditions
for strain in strains:
Collaborator

It's fairly minor since it's so short, but it might be nice to use tqdm for this, just so we can see progress

For slower models, it can take a minute or more, and since it's very reasonable to run this locally, that might be concerning
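
A minimal sketch of what this could look like, assuming `tqdm` is available in the environment (the fallback keeps the loop working if it is not). The strain labels and loop body are hypothetical placeholders for the real calculation.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: just return the iterable.
    def tqdm(iterable, **kwargs):
        return iterable


strains = ["-2.0", "0.0", "2.0"]  # hypothetical strain labels
visited = []

# Wrapping the iterable is the only change needed to get a progress bar.
for strain in tqdm(strains, desc="Strain conditions"):
    visited.append(strain)  # stand-in for the per-strain calculation
```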

Comment on lines +200 to +219
@plot_scatter(
    filename=OUT_PATH / model / f"figure_{orientation}_{strain}.json",
    title=f"{orientation} binding energy curve ({strain[1:5]}% strain)",
    x_label="Distance / Å",
    y_label="Adsorption energy / meV",
    show_line=True,
)
def plot_model_binding_energy_curve(
    model, orientation, strain
) -> dict[str, tuple[list[float], list[float]]]:
    return {
        "ref": (
            results["distances"],
            results["ref"][orientation][strain]["energies"],
        ),
        model: (
            results["distances"],
            results[model][orientation][strain]["energies"],
        ),
    }
Collaborator

I would probably move this and the other plots out of this function, probably just to a top-level function that is called here.

You may still need to wrap it in a function to parameterise the plot_scatter decorators, but it makes processed_data a bit less unwieldy
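
One possible shape for this, sketched with a hypothetical stand-in decorator (`plot_scatter_stub`) because the real `plot_scatter` signature is only partly visible in the diff. The top-level helper builds the parameterised, decorated function and calls it, so `processed_data` would only need a one-line call per curve.

```python
from functools import wraps


def plot_scatter_stub(title):
    """Hypothetical stand-in for the project's plot_scatter decorator."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # The real decorator would render and save a figure; this one
            # simply bundles the title with the returned traces.
            return {"title": title, "traces": func(*args, **kwargs)}

        return wrapper

    return decorator


def plot_binding_energy_curve(results, model, orientation, strain):
    """Top-level helper: build the decorated plot function and call it once."""

    @plot_scatter_stub(title=f"{orientation} binding energy curve ({strain}% strain)")
    def _curve():
        distances = results["distances"]
        return {
            "ref": (distances, results["ref"][orientation][strain]["energies"]),
            model: (distances, results[model][orientation][strain]["energies"]),
        }

    return _curve()


# Tiny synthetic input; keys mirror the structure used in the diff.
results = {
    "distances": [2.5, 3.0],
    "ref": {"flat": {"0.0": {"energies": [-100.0, -120.0]}}},
    "model-x": {"flat": {"0.0": {"energies": [-90.0, -125.0]}}},
}
figure = plot_binding_energy_curve(results, "model-x", "flat", "0.0")
```

Passing `results` in explicitly, rather than closing over it, is what lets the helper live at module level.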

Comment on lines +337 to +343
for i in range(len(processed_data["distances"])):
    deviations.append(
        abs(
            processed_data[model][orientation][strain]["energies"][i]
            - processed_data["ref"][orientation][strain]["energies"][i]
        )
    )
Collaborator

Can this be rewritten to use mae from analysis.utils? This is absolutely fine, but we intend at some point to make changes such that RMSE etc. are computed alongside MAE in a swappable way, and so it would be simpler if we're able to swap out a single function.
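
To illustrate the idea, here is a sketch in which the manual loop is replaced by one call to a swappable metric. The `mae` defined here is a hypothetical stand-in, since the signature of the helper in analysis.utils is not shown in this thread; `rmse`, `paired_energies`, and `single_point_error` are likewise illustrative names.

```python
import math


def mae(ref, pred):
    """Mean absolute error (hypothetical stand-in for the project's helper)."""
    return sum(abs(p - r) for r, p in zip(ref, pred)) / len(ref)


def rmse(ref, pred):
    """Root-mean-square error with the same call signature as mae."""
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(ref, pred)) / len(ref))


def paired_energies(processed_data, model):
    """Flatten reference and model energies into two aligned lists."""
    ref, pred = [], []
    for orientation, by_strain in processed_data["ref"].items():
        for strain, entry in by_strain.items():
            ref.extend(entry["energies"])
            pred.extend(processed_data[model][orientation][strain]["energies"])
    return ref, pred


def single_point_error(processed_data, model, metric=mae):
    """Apply one swappable metric to every single-point energy."""
    ref, pred = paired_energies(processed_data, model)
    return metric(ref, pred)


# Synthetic data with the same nesting as processed_data in the diff.
processed_data = {
    "ref": {"flat": {"0.0": {"energies": [-100.0, -120.0]}}},
    "model-x": {"flat": {"0.0": {"energies": [-90.0, -125.0]}}},
}
error_mae = single_point_error(processed_data, "model-x")
error_rmse = single_point_error(processed_data, "model-x", metric=rmse)
```

Because the metric is passed as an argument, switching from MAE to RMSE touches only the call site.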

)
)
results[model] = np.nan_to_num(
    np.mean(deviations), nan=99999, posinf=99999, neginf=99999
Collaborator

For now, if a model fails I'd probably return None. We have plans to translate something like np.inf to the 0 score, but None would probably be most consistent, if it works ok here.
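
A small sketch of that behaviour, assuming the deviations arrive as a plain list of floats. The helper name `mean_or_none` is hypothetical.

```python
import math


def mean_or_none(deviations):
    """Return the mean deviation, or None if the model produced no usable value.

    None replaces sentinel numbers such as 99999, so a failed model cannot be
    mistaken for a very large but genuine error.
    """
    if not deviations:
        return None
    mean = sum(deviations) / len(deviations)
    # NaN or infinite entries propagate into the mean and are caught here.
    return mean if math.isfinite(mean) else None


ok = mean_or_none([1.0, 3.0])
failed = mean_or_none([1.0, float("nan")])
```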

)
)
results[model] = np.nan_to_num(
    np.mean(deviations), nan=999, posinf=999, neginf=999
Collaborator

As above

Comment on lines +3 to +18
  good: 40.0
  bad: 1000.0
  unit: meV
  weight: 1.0
  tooltip: Mean Absolute Error across all orientations, distances, and strains
  level_of_theory: PBE
Binding Energies MAE:
  good: 40.0
  bad: 1000.0
  unit: meV
  weight: 1.0
  tooltip: Mean Absolute Error of binding energies across all orientations and strains
  level_of_theory: PBE
Binding Lengths MAE:
  good: 0.0
  bad: 1.0
Collaborator

Have you thought about these good and bad thresholds (genuine question)?

1 eV seems quite large to me
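
To make the concern concrete, here is a sketch that assumes the thresholds are mapped to a score by clipped linear interpolation (1 at `good`, 0 at `bad`). That mapping is an assumption for illustration and may not match how the project actually normalises scores; the 200 meV error and the alternative `bad` value of 200 meV are also hypothetical.

```python
def normalised_score(value, good, bad):
    """Assumed mapping: value == good gives 1.0, value == bad gives 0.0,
    linear in between and clipped to the range [0, 1]."""
    score = (bad - value) / (bad - good)
    return max(0.0, min(1.0, score))


# A hypothetical 200 meV MAE under the thresholds in the diff (good=40, bad=1000)...
lenient = normalised_score(200.0, good=40.0, bad=1000.0)
# ...and under a hypothetical tighter upper threshold of 200 meV.
strict = normalised_score(200.0, good=40.0, bad=200.0)
```

Under this assumed mapping, the wide 40 to 1000 meV window leaves most of the score intact even for errors of a few hundred meV, which is why the upper threshold is worth revisiting.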

)


def struct_from_scatter_custom(scatter_id, struct_id, structs):
Collaborator

What does the normal struct_from_scatter function not do that you need?

)


def plot_and_struct_from_scatter(scatter_id, plot_id, plots_list, struct_id, structs):
Collaborator

I'm not sure I understand why this is needed in addition to the other callbacks we have/you have added?

Comment on lines +81 to +85
return (
    Div("Click on a metric to view plot."),
    Div("Click on a metric to view plot."),
    Div("Click on a metric to view the structure."),
)
Collaborator

Why do we want all of these displayed, and what does this do that we can't do with existing callback helpers?

Labels

new benchmark Proposals and suggestions for new benchmarks

Development

Successfully merging this pull request may close these issues.

Graphene Wetting Under Strain

3 participants